In metazoans, the nuclear lamina is thought to play an important role in the spatial organization of interphase chromosomes,
by providing anchoring sites for large genomic segments named lamina-associated domains (LADs). Some of
these LADs are cell-type specific, while many others appear constitutively associated with the lamina. Constitutive LADs 
(cLADs) may contribute to a basal chromosome architecture. By comparison of mouse and human lamina interaction 
maps, we find that the sizes and genomic positions of cLADs are strongly conserved. Moreover, cLADs are depleted of 
synteny breakpoints, pointing to evolutionary selective pressure to keep cLADs intact. Paradoxically, the overall sequence 
conservation is low for cLADs. Instead, cLADs are universally characterized by long stretches of DNA of high A/T content. 
Cell-type specific LADs also tend to adhere to this "A/T rule" in embryonic stem cells, but not in differentiated cells. This 
suggests that the A/T rule represents a default positioning mechanism that is locally overruled during lineage commit- 
ment. Analysis of paralogs suggests that during evolution changes in A/T content have driven the relocation of genes to 
and from the nuclear lamina, in tight association with changes in expression level. Taken together, these results reveal that 
the spatial organization of mammalian genomes is highly conserved and tightly linked to local nucleotide composition. 

[Supplemental material is available for this article.] 



The spatial architecture of interphase chromosomes is thought 
to be important for gene regulation and genome maintenance 
(Misteli and Soutoglou 2009; Kind and van Steensel 2010). How- 
ever, the organization of chromosomes inside the nucleus is still 
poorly understood. While folding of the chromatin fiber is to some 
degree stochastic, most genomic loci are nonrandomly positioned 
with respect to each other and relative to fixed landmarks in the 
nucleus. Two classes of biochemical mechanisms are thought to 
contribute to this nonrandom positioning (van Steensel and 
Dekker 2010). First, a variety of protein complexes mediate specific 
physical associations between linearly distant loci. Second, specific 
loci may be anchored to large nuclear structures that serve as 
scaffolds. One of the main candidates for such a scaffold function 
is the nuclear lamina (NL). 

The NL is a filamentous structure of proteins lining the inner 
nuclear membrane of metazoans. Lamins are intermediate fila- 
ment proteins that form the major component of the NL. In 
mammals, these are represented by lamin A/C (A-type) and lamins
B1 and B2 (B-type). By DamID of B-type lamins, we have previously
shown that genomes of organisms evolutionarily as distant as fruit 
fly, mouse, and man have large nuclear lamina-associated do- 
mains (LADs) (Pickersgill et al. 2006; Guelen et al. 2008; Peric- 
Hupkes et al. 2010; van Bemmel et al. 2010). LADs are generally 
very large regions (typically hundreds of kilobases) and collectively 
cover ~35% of the genome. Genes within LADs are generally
transcriptionally inactive. Comparison of mouse embryonic stem 
(ES) cells and differentiated cell types revealed that hundreds of 
genes interact with the NL in a cell-type specific (facultative) 
manner. These genes lose NL association upon or prior to their 
activation during differentiation, or gain NL association if they are 
no longer expressed (Peric-Hupkes et al. 2010). 

Despite these dynamics, there appear to be many regions in 
the genome that interact with the NL in a cell type independent 
manner. Such constitutive LADs (cLADs) may provide chromo- 
somes with a basic "backbone" structure that is shared among most 
or all cell types. Insight into the nature of cLADs is therefore of 
importance to our understanding of the mechanisms that de- 
termine the spatial architecture of chromosomes. Here, we report 
a detailed analysis of cLADs. We find that they are highly con- 
served between mouse and human, indicating that they are 
functionally important. Sequence analysis reveals that cLADs can 
be predicted based on their high A/T content. Furthermore, we 
demonstrate that divergence of spatial positioning of paralogous 
genes strongly correlates with a divergence in their overall A/T 
content. We propose that A/T-rich stretches in mammalian genomes 
serve as NL-anchoring sequences that form a structural backbone
of interphase chromosomes. 

Results 

Genomic regions of constitutive NL interactions have 
distinctive properties 

To compare regions that exhibit constitutive and facultative NL 
interactions, we used previously reported genome-wide DamID 
lamin B1 interaction data from four different mouse cell types:
embryonic stem cells (ESCs), neural precursor cells (NPCs), as- 
trocytes (ACs), and embryonic fibroblasts (MEFs) (Peric-Hupkes 
et al. 2010). The DamID data were obtained using genomic tiling 
arrays with a median probe spacing of ~1.2 kb. To compare NL
interactions among the four cell types, we first used a hidden 
Markov model (HMM) to classify all microarray probes in each of 
the cell types as either LAD or inter-LAD (Fig. 1A). Next, we defined
constitutive LADs (cLADs) as regions that are LAD in all four
cell types, constitutive inter-LADs (ciLADs) as regions that are
inter-LAD in all four cell types, and facultative LADs (fLADs) as
regions for which LAD status is cell-type dependent (Fig. 1B).
Furthermore, we refer to cLADs and ciLADs collectively as "con- 
stitutive regions" because they are both invariant between cell 
types. We find that ~71% of the genome is organized in a constitutive
manner, consisting of roughly equal parts of cLADs
(33%) and ciLADs (38%) (Fig. 1C). We note that these definitions 
are operational; a comprehensive definition would require maps 
of genome-NL interactions in all possible cell types. Neverthe- 
less, as shown below, the operational definitions used here pro- 
vide useful insights. 
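
The operational definitions above amount to a simple rule applied to each probe. The following minimal Python sketch illustrates that rule only; the function name `classify_region` and the example LAD calls are hypothetical and are not taken from the analysis pipeline used in this study.

```python
from collections import Counter


def classify_region(states):
    """Label a probe from its per-cell-type LAD calls.

    `states` holds one boolean per cell type (True = LAD, False = inter-LAD).
    """
    states = list(states)
    if not states:
        raise ValueError("at least one cell type is required")
    if all(states):
        return "cLAD"   # LAD in every cell type
    if not any(states):
        return "ciLAD"  # inter-LAD in every cell type
    return "fLAD"       # LAD status depends on the cell type


# Hypothetical HMM calls for four probes, ordered as (ESC, NPC, AC, MEF).
probe_calls = [
    (True, True, True, True),
    (False, False, False, False),
    (True, False, False, True),
    (False, False, False, False),
]

labels = [classify_region(calls) for calls in probe_calls]
# Fraction of probes in each class (the study reports genome coverage instead).
fractions = {label: n / len(labels) for label, n in Counter(labels).items()}
```

With real data, coverage would be weighted by the genomic length each probe represents rather than by probe count; the unweighted fractions here are a simplification.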



We have previously shown that LADs are relatively gene poor
in multiple species (Pickersgill et al. 2006; Guelen et al. 2008; Peric-Hupkes
et al. 2010; van Bemmel et al. 2010). cLADs are also very
gene poor, even more so than fLADs (Fig. 1D). Consistent with
this, gene deserts are substantially enriched in cLADs (Fig. 1E).
Furthermore, LINE elements (Fig. 1F), of which most are L1 elements,
and simple A/T-rich elements (Fig. 1G) are enriched, while
SINE elements are depleted in cLADs (Fig. 1H). Importantly, cLADs
and fLADs differ for all five genomic features (Fig. 1D-H), indicating
that these two classes of LADs are distinct even though
they were defined based on four cell types only.

cLADs are conserved across species 

We reasoned that if cLADs are important, then they are likely to be 
conserved across species. To test this, we generated a genome-wide 
map of NL interactions in human ESCs and compared it with the 
previously derived map from mouse ESCs (Peric-Hupkes et al. 
2010). After correction for divergence in synteny, we observed 
a remarkable similarity of the NL interactions between mouse and 
human ESCs (Fig. 2A,B; Supplemental Fig. 1), with an overall 
concordance of 83%. Remarkably, this concordance is even higher 
in constitutive regions (91%) and relatively low in fLADs (67%, 
which is only modestly higher than the 52% concordance ex- 
pected by random chance, although this is also statistically signifi- 
cant) (Table 1). The evolutionary conservation of NL interactions is 
not restricted to ESCs, because a separate comparison between fi- 
broblasts from both species yielded similar results (Supplemental 
Fig. 2; Table 1). To further investigate conservation of constitutive 
regions between mouse and human, we also generated DamID 
lamin B1 maps for the human HT1080 fibrosarcoma cell line and
Figure 1. A core architecture of genome-nuclear lamina interactions. (A) Regions classified by HMM as lamin B1 interacting in mouse ESC (orange), NPC
(blue), astrocyte (magenta), and MEF (dark green) cell cultures, shown for a 40-Mb region of chromosome 2. (B) Regions common to all cell types are termed
cLADs (mustard) and ciLADs (cyan), with dynamic regions termed fLADs (gray). (C) Mean profile of lamin B1 association in the cell types indicated in A. Colors
as in B. The core architecture of constitutive regions makes up ~71% of the genome, with 33% in cLAD and 38% in ciLAD regions. (D-H) Percent coverage of
cLAD (mustard), fLAD (gray), and ciLAD regions for genes (D), gene deserts (E), LINE elements (F), simple A/T-rich elements (G), and SINE elements (H). Error
bars indicate two standard deviations around mean coverage values for random regions obtained through circular permutations.
Figure 2. Core architecture is highly conserved between mouse and human. (A) Map of NL interactions in mouse ESCs for chromosome 2 ([mustard]
cLADs; [cyan] ciLADs; [gray] fLADs), and (B) in human ESCs for the corresponding syntenic regions (each color denotes a different human chromosome).
(C) Example of co-occurrence of synteny breakpoint and LAD border. Color scheme as in A and B. (D) Ratio of observed and expected occurrence of
breakpoints in cLADs, ciLADs, or their border regions. (E) Cartoon representations of homotypic and heterotypic junctions; breakpoints indicated by
dashed gray lines. All breakpoints in constitutive regions (n = 143) were scored based on whether they coincide with an LAD in mouse or human.



found that as much as 71% of mouse constitutive regions are also 
constitutive across human ESCs, fibroblasts, and HT1080 cells. 
Taken together, the pattern of cLADs and ciLADs is highly conserved
between mouse and human, which are separated by ~75
million years in the evolutionary tree.
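
To make the concordance scores and their random expectation concrete, the sketch below treats each species as a vector of per-probe LAD states (1 = LAD, 0 = inter-LAD) over aligned positions. It rests on assumptions: concordance is taken as the fraction of aligned positions with identical state, and the random expectation is estimated by circularly shifting one vector, in the spirit of the circular permutations mentioned for Table 1. The function names and the toy vectors are hypothetical.

```python
import random


def concordance(a, b):
    """Fraction of aligned positions at which two state vectors agree."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def circular_shift(seq, offset):
    """Rotate a list by `offset` positions, preserving its domain structure."""
    offset %= len(seq)
    return seq[offset:] + seq[:offset]


def permutation_baseline(a, b, n_perm=1000, seed=0):
    """Return (observed, mean random concordance, empirical one-sided P)."""
    rng = random.Random(seed)
    observed = concordance(a, b)
    scores = [
        concordance(a, circular_shift(b, rng.randrange(1, len(b))))
        for _ in range(n_perm)
    ]
    mean_random = sum(scores) / n_perm
    # Add-one correction so the empirical P value is never exactly zero.
    p_value = (1 + sum(s >= observed for s in scores)) / (n_perm + 1)
    return observed, mean_random, p_value


# Toy example: two nearly identical domain maps with two discordant probes.
mouse = [1] * 30 + [0] * 50 + [1] * 20 + [0] * 40 + [1] * 45 + [0] * 15
human = list(mouse)
human[5] = 0
human[100] = 1

observed, mean_random, p_value = permutation_baseline(mouse, human)
```

Circular shifting keeps the block structure of LADs intact, which is why it gives a more conservative baseline than shuffling individual probes.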

The cell types that we studied may have similar gene 
expression repertoires between mouse and human. Because 
gene expression is overall inversely correlated to NL association 
(Peric-Hupkes et al. 2010), it is possible that the observed evo- 
lutionary conservation of NL interaction patterns is merely a 
result of conserved gene expression patterns. To rule out this 
possibility, we investigated whether the conservation of NL association
is linked to the presence of genes. This revealed that
this is not the case: The interspecies concordance of NL interactions
does not decrease when genic regions are removed
from the analysis (Table 1). Thus, conserved gene expression
patterns cannot account for the remarkable conservation of the 
pattern of cLADs and ciLADs. 

The high overall conservation of cLADs suggests that chro- 
mosomal rearrangements during evolution that disrupt this orga- 
nization would be deleterious. Such deleterious effects would be 
minimized if rearrangement breakpoints are restricted to cLAD/ 
ciLAD borders, thereby keeping both cLADs and ciLADs mostly 
intact. To test this prediction, we studied 143 loci representing 
chromosome rearrangements between human and mouse, as evi- 
denced by transitions between mapped chromosomes (i.e., change 
of colors in Fig. 2B) in constitutive regions. Strikingly, these 
breakpoints occur preferentially close to cLAD/ciLAD borders and 
are significantly depleted from cLADs (Fig. 2C,D). This suggests an 
overall importance of cLAD integrity, even though cLADs have 
relatively low gene densities and thus may be expected to be under 
low selective pressure. Although this analysis is restricted to con- 
stitutive regions only, the same analysis for the full genome yields 
very similar results (Supplemental Fig. 3A).



Although synteny breakpoints are significantly depleted from
cLADs (24 out of 143, P < 3 × 10⁻⁵), they are not completely absent.
We asked whether these tolerated junctions are homotypic
(joining two LAD segments) rather than heterotypic (joining a
LAD segment to a non-LAD segment). Indeed, we found that
breaks that do occur within cLADs are less likely to cause heterotypic
rearrangement junctions than those outside LADs
(P = 1.126 × 10⁻⁷) (Fig. 2E). When also incorporating facultative
regions in the analysis, the same conclusion holds (Supplemental 
Fig. 3B). Because homotypic rather than heterotypic junctions 
would prevent fragmentation into increasingly smaller LADs and 
thereby loss of the overall architecture, these observations suggest 
that disruptions of the cLAD core architecture are under negative 
selection pressure. 
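
The logic of the depletion test can be illustrated with an exact binomial calculation. This is a deliberately simplified sketch and not the test used in this study, whose null model is not described in this section: it assumes that breakpoints fall uniformly over constitutive regions and that cLADs make up 33/71 of those regions (from the 33% and 38% genome coverage reported above), ignoring the separate border category of Figure 2D. The variable names are hypothetical.

```python
from math import comb


def binomial_lower_tail(k, n, p):
    """Exact probability of observing k or fewer successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n_breakpoints = 143  # breakpoints in constitutive regions (from the text)
n_in_clads = 24      # breakpoints that fall inside cLADs (from the text)

# Assumption: cLAD share of the constitutive genome under a uniform null.
assumed_clad_fraction = 33 / 71

expected_in_clads = n_breakpoints * assumed_clad_fraction
observed_over_expected = n_in_clads / expected_in_clads
p_depletion = binomial_lower_tail(n_in_clads, n_breakpoints, assumed_clad_fraction)
```

Because the null fraction is an assumption, the resulting P value should not be compared directly with the value reported above; the sketch only shows how an observed/expected ratio and a one-sided depletion probability could be derived.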

cLAD sequences are characterized by high A/T content 

We next asked whether this strong evolutionary conservation of 
constitutive NL interactions is also reflected in conservation of the 
underlying genomic sequence. To address this, we compared av- 
erage basewise conservation scores (derived from sequence align- 
ments of 28 placental mammals) (Pollard et al. 2010) in cLADs, 
ciLADs, and fLADs. We restricted this analysis to intergenic and
intronic regions because coding sequences are subject to a variety
of selective pressures that could confound this analysis. Surprisingly,
the results show that both intergenic and intronic sequences
in cLADs are generally less well conserved than in ciLADs and
fLADs (Supplemental Fig. 4). Thus, the strong overall conservation
of cLAD positions is not accompanied by strong conservation of
cLAD nucleotide sequence.

Table 1. Concordance scores between mouse and human cells

                        ESCs (random)       Fibroblasts (random)
Genome                  83.06% (50.48%)     73.84% (49.74%)
Constitutive regions    91.03% (53.99%)     77.44% (51.02%)
Facultative regions     67.08% (52.47%)     66.74% (51.11%)
Nongenic probes         83.03% (50.57%)     72.57% (50.82%)

All reported concordance scores are significantly higher than random with
P < 10⁻⁵, as assessed by circular permutation.

We reasoned that, despite the poor overall sequence conser- 
vation, there may still be specific sequence motifs dispersed 
throughout cLADs that could mediate NL interactions. To investigate
this, we systematically searched for sequence motifs of
length 1-5 bp that could discriminate cLADs from ciLADs. For this
purpose, we selected 1000 cLAD and 1000 ciLAD regions of 1 kb
from the mouse genome that show the most extreme lamin B1
association across all four mouse cell types. Furthermore, we excluded
all genic regions, because those may introduce a bias in the
analysis. Strikingly, this motif search revealed that A/T
content alone is sufficient to discriminate cLADs from ciLADs with
94% accuracy (10-fold cross-validation accuracy). cLADs and ciLADs
have a mean A/T content of ~63% and 54%, respectively, with very
little overlap in the distributions (Fig. 3A).
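
The "A/T rule" can be illustrated with a very small sketch. The classifier used in this study is not specified in this section, so the fixed threshold below is an assumption: it is set to 58%, the genome-wide average A/T content, which lies between the cLAD and ciLAD means of ~63% and 54%. The function names and example sequences are hypothetical.

```python
def at_content(seq):
    """Fraction of A or T among the unambiguous bases (A, C, G, T) of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "AT" for b in bases) / len(bases)


def predict_clad(seq, threshold=0.58):
    """Predict cLAD (True) or ciLAD (False) from A/T content alone.

    The default threshold is an illustrative assumption, not a fitted value.
    """
    return at_content(seq) > threshold


# Hypothetical toy sequences; the study used 1-kb regions.
at_rich = "AAATTTGC" * 10  # 75% A/T
gc_rich = "GGCCAT" * 10    # ~33% A/T

predictions = [predict_clad(at_rich), predict_clad(gc_rich)]
```

In practice the threshold would be learned from training regions and evaluated by cross-validation, as was done for the accuracy reported above.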

The predictive accuracy of the cLAD versus ciLAD classifier 
increases slightly, albeit nonsignificantly, when the occurrence of 
dinucleotides is used as a predictive feature (95.85%) but does not 
significantly increase further when longer k-mers are used (Fig. 3B).
It is therefore unlikely that recurrent sequence motifs larger than 
1-2 bp explain NL interactions of constitutive regions. In fact, 
a survey of the enrichment of individual dinucleotides in cLADs 
versus ciLADs shows that the main driving force behind the clas- 
sification accuracies is simple A/T content (Fig. 3C). 
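
The k-mer features and the per-dinucleotide enrichment described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in this study: counts are taken over overlapping windows on one strand, converted to frequencies, and stabilized with a pseudocount of 1 before the log2 ratio is computed. The function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping windows with ambiguous bases."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def log2_enrichment(fg_seqs, bg_seqs, k=2, pseudocount=1):
    """Log2 ratio of k-mer frequencies in foreground versus background."""
    fg, bg = Counter(), Counter()
    for s in fg_seqs:
        fg.update(kmer_counts(s, k))
    for s in bg_seqs:
        bg.update(kmer_counts(s, k))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    fg_total = sum(fg.values()) + pseudocount * len(kmers)
    bg_total = sum(bg.values()) + pseudocount * len(kmers)
    return {
        kmer: log2(((fg[kmer] + pseudocount) / fg_total)
                   / ((bg[kmer] + pseudocount) / bg_total))
        for kmer in kmers
    }


# Hypothetical toy regions standing in for cLAD and ciLAD sequences.
enrichment = log2_enrichment(["AATTAATT"], ["GGCCGGCC"], k=2)
```

Positive values mark dinucleotides that are more frequent in the foreground (here the A/T-rich toy sequence), negative values those more frequent in the background.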

We considered two potential confounding factors in these 
analyses. First, we ruled out that the high abundance of AT-rich 
LINE-1 elements in cLADs explained the classification result. Re- 
moval of all LINE-1 elements did not affect the classification ac- 
curacy (data not shown). Second, we asked whether A/T content 
might simply correlate with gene density, which is low in cLADs 
and high in ciLADs. Within ciLADs (which have the broadest di- 
versity of gene density), we found no significant correlation be- 
tween A/T content and gene density (see Methods), indicating that 
this is not a major confounding factor. In summary, intergenic 
regions in cLADs almost invariably have a high A/T content, while 
those in ciLADs have a consistently low A/T content. 



The observed segmentation of the genome into ciLADs and 
cLADs of low and high A/T content, respectively, reminded us of 
the long-known partitioning of mammalian genomes into isochores, 
which are long DNA segments of relatively homogeneous base
composition (Bernardi et al. 1985; Eyre-Walker and Hurst 2001). 
Indeed, comparison of the mouse cLAD/ciLAD pattern to that of 
isochores (Costantini et al. 2009) revealed a strikingly high con- 
cordance of 93% (Supplemental Fig. 5). 

A/T content predicts fLADs only in ESCs 

Because of the unique pluripotent nature of ESCs, it has been 
suggested that these cells contain a set of default NL interactions, 
together forming a "basal" state, that are progressively modified 
during subsequent differentiation steps (Peric-Hupkes et al. 2010). 
In other words, NL interactions in ESCs may be driven by a default 
mechanism; upon differentiation, this default mechanism could 
be partially overruled by cell-type specific mechanisms that re- 
locate fLADs to or from the NL. We wondered whether the A/T rule 
that we identified for cLADs could constitute the default mecha- 
nism in this model. If so, then fLADs should adhere to the A/T rule 
in ESCs, but not in differentiated cells. Indeed, the A/T rule predicts 
the NL interaction status of fLADs in ESCs with an accuracy of 71% 
(Fig. 4). In contrast, in the three differentiated cell types, this 
accuracy drops to 30%-40%. The latter numbers are less than 
expected by chance, suggesting that differentiated cells have one 
or more mechanisms that overrule A/T-driven interactions of 
fLADs. These results are consistent with a model in which A/T- 
driven NL interactions of the genome constitute a default prin- 
ciple that is locally overruled by cell-type specific mechanisms. 

A/T-content divergence of paralogous genes is linked 
to differential NL interaction 

If A/T content is a determinant of NL interactions, then during 
evolution some genes may have acquired a high A/T content to 
enhance their NL associations, while other genes may have be- 
come C/G rich in order to keep them in the nuclear interior. Be- 
cause paralogous genes are derived from a common ancestor, they 
are particularly useful to investigate this model. We therefore 
compared differences in A/T content of paralogous genes with 
differences in NL association. Remarkably, paralogous genes with 
Figure 3. A/T content is a strong predictor of constitutive lamina association. (A) A/T content versus mean mouse lamin B1 association for a random
sample of 10,000 defined cLAD, fLAD, and ciLAD regions (gray dots). (Black dots) The regions used for classification (1000 per group). (Dashed line)
Genome-wide average (58%). (B) Tenfold cross-validation classification accuracies of classifiers using k-mers as predictive features, with k = 1 ... 5. Bars
show mean accuracies, with error bars indicating one standard deviation. (C) Log2 ratios of occurrence counts in cLAD versus ciLAD regions for each
dinucleotide. Bar shading indicates the A/T content of dinucleotides.
Figure 4. A/T-content rule during differentiation. Adherence to A/T-content
rule by regions that show cell-type specific lamin interactions, as
measured by classification accuracy using the classifier (k = 1) trained on
cLAD and ciLAD regions. Bars show mean bootstrapped classification accuracies,
with error bars indicating standard deviation. (Dotted horizontal
line) Random performance.



a clear difference in A/T content generally differ strongly in their
NL interactions, in a manner consistent with the A/T-content
rule (Fig. 5A; Supplemental Fig. 6A,C,E; P < 1 × 10⁻⁴). Also, differential
NL interaction levels of paralogous genes correlate with
differences in expression status (Fig. 5B; Supplemental Fig.
6B,D,F; P < 1 × 10⁻⁴). More specifically, a gene that underwent an
increase in A/T content, relative to its non-lamina-associated
paralogous gene, typically shows enhanced association with the
NL, as well as a strongly decreased level of expression. These results
suggest that the A/T-content rule may be an important
evolutionary tool to modify gene localization and activity.
However, it cannot be ruled out that genes that move to LADs
become more A/T rich once they reside there, due to regional
biases in DNA replication or repair (Eyre-Walker and Hurst 2001).
Regardless of the underlying evolutionary mechanism, this 



analysis of paralogs further underscores the overall relationship 
between NL interactions and A/T content. 

A/T content also correlates with NL interactions in other 
metazoans 

We wondered to what extent NL interactions are also corre- 
lated with A/T content in other species. We investigated this for 
Caenorhabditis elegans and Drosophila melanogaster, for which NL 
interactions were mapped in embryos and cultured embryonic 
cells, respectively (Ikegami et al. 2010; van Bemmel et al. 2010). In 
both species, LADs have significantly higher A/T content than 
inter-LAD regions, although the magnitudes of the differences are 
relatively small (Supplemental Fig. 7). Not enough data are avail- 
able to discriminate cLADs from fLADs in these species. 

No detectable role for POU2F1 in genome-NL anchoring 

The presence of a clear sequence signature (i.e., high A/T content) 
in cLADs suggested that one or more proteins in the NL may bind 
directly to A/T-rich sequences and thereby tether cLADs to the NL. 
Previously, we reported that human LADs are enriched for binding
motifs of POU2F1 (Guelen et al. 2008), which is a ubiquitous DNA-binding
factor (also known as Oct1) that is localized at the nuclear
periphery and interacts with lamin B1 (Malhas et al. 2009). Since
all published POU2F1-binding motifs are relatively A/T rich (consensus
sequence: TATGCAAAT), it is possible that this protein
mediates NL association by binding both its genomic targets and
lamins simultaneously. To test this, we generated genome-wide maps
of lamin B1 interactions in wild-type and POU2F1−/− MEFs (Wang
et al. 2004). The results indicate that genome-NL interactions are
largely independent of POU2F1, with up to 96% concordance for
constitutive regions between wild-type and POU2F1-null cells (Supplemental
Fig. 8A-C). Furthermore, plotting of changes in lamin B1
DamID signals around POU2F1-binding motifs failed to detect any
local effect of POU2F1 (Supplemental Fig. 8D). These results indicate
Figure 5. Correlation of A/T content with behavior of paralogous gene pairs. Panels show mean NL interaction scores in mouse ES cells for all known
paralog pairs. Each dot represents a paralog pair, colored by percent difference in A/T content (A) and log2 difference in expression level (B). The
assignment of paralog 1 or 2 is redundant, hence the symmetry across the diagonal. Expression data for mouse ES cells are taken from Mikkelsen et al.
(2007), based on Affymetrix arrays, which is why for a number of genes no value is reported in B.
that POU2F1 does not play an essential role in genome-NL interactions,
at least not in MEFs.

Lamin B1 and lamin A interact with largely the same
genomic regions 

Most cells express both A- and B-type lamins. Several lines of evidence
indicate that these have in part distinct functions (for review,
see Dechat et al. 2010). For example, A-type lamins are essentially
absent in ESCs; mutations in lamin A/C can cause a broad
range of human disorders not associated with mutations in B-type 
lamins; A- and B-type lamins have been reported to form spatially 
distinct meshworks inside the NL; and finally, while B-type lamins 
are primarily restricted to the NL, A-type lamins are typically also 
present in the nuclear interior. These differences raised the possi- 
bility that A- and B-type lamins interact with distinct parts of 
the genome. Because in vivo interactions so far have only been 
mapped for lamin B1 (Guelen et al. 2008; Peric-Hupkes et al. 2010;
Handoko et al. 2011), we used DamID to generate genome-wide 
maps of lamin A interactions. We focused on NPCs and ACs, which 
express lamin A endogenously. 

The resulting lamin A binding profiles are very similar to
those of lamin B1, with an overall concordance of 94%-95% between
lamins A and B1 in both cell types (Fig. 6). In particular, the
interactions for constitutive regions are virtually identical at 99%
and 98% concordance in NPCs and ACs, respectively (Fig. 6C,F).
Within fLADs the concordance between lamin A and lamin B1
binding is somewhat lower but still substantial (83%-86%). To
assess whether the strong overall agreement in binding between
lamin A and lamin B1 is restricted to the tested mouse cell types, we
generated and compared genome-wide DamID maps for both
proteins in human HT1080 fibrosarcoma cells. This again yielded
a very high concordance between the two lamins (~97%) (Supplemental
Fig. 9). Taken together, these findings demonstrate that
cLADs interact robustly with both lamin A and lamin B1 in mouse



and human, while fLADs may have slight preferences for one 
lamin type over the other. 

Discussion 

Using maps of genome-NL interactions in various mouse and 
human cell types, we conducted a detailed analysis of cell type 
invariant (constitutive) NL interactions. We find that the positions 
of cLADs along the genome are highly conserved between mouse 
and human. This contrasts with the overall poor conservation of 
transcription factor binding sites, which has been reported to be 
<15% between mouse and human (Schmidt et al. 2010; Soccio 
et al. 2011), and the moderate conservation (~30%) of CTCF-binding
sites (Schmidt et al. 2012). Interestingly, recent systematic
mapping of chromatin contacts in human and mouse cells iden- 
tified hundreds of megabase-sized domains that are conserved by 
>50%. These "topological domains" overlap in part with LADs 
(Dixon et al. 2012; Nora et al. 2012), but limited genome coverage 
or a >10-fold difference in the mapping resolution precludes more 
detailed comparisons. 

Our analyses indicate that synteny breaks are depleted from 
cLADs, despite the fact that cLADs are gene poor. We cannot en- 
tirely rule out that LAD border regions are somehow intrinsically 
fragile and thus have accumulated synteny breakpoints, or that 
LADs are somehow less susceptible to DNA breaks. Furthermore, 
those synteny breaks that do occur in LADs tend to favor homo- 
typic junctions, which leads to preservation of LADs as large units. 
Possibly, peripheral positioning of cLADs requires cooperative in- 
teractions of long A/T-rich stretches of DNA with NL components. 
Accordingly, fragmentation of such stretches by genomic rear- 
rangements might cause weakening of NL association and thereby 
loss of spatial organization. Clustering of cLADs in the peripheral 
nuclear compartment may promote homotypic rearrangements 
and thereby preserve the large sizes of cLADs. It is also possible that 
heterotypic junctions lead to repression of genes in ciLADs near 
Figure 6. The core architecture is shared among lamin A and lamin B1. Genome-NL interaction profiles for mouse chromosome 2 as assayed by DamID
of lamin A (A,D) and lamin B1 (B,E) in NPCs (A,B) and astrocytes (D,E). Data in B and E are from Peric-Hupkes et al. (2010). (Black) Constitutive regions;
(gray) facultative regions. (C,F) Concordance scores are shown for all, constitutive (cLAD and ciLAD) and facultative regions.
the junction borders as a result of the heterochromatic nature of
cLADs. Regardless of the underlying mechanisms, our results in- 
dicate that the positions and size of cLADs have been remarkably 
conserved. 

Despite this conservation of size and positions, noncoding 
DNA sequence in cLADs is relatively poorly conserved. This ap- 
pears to match well with an earlier observation that DNA se- 
quences in compacted chromatin are less conserved than in 
decondensed chromatin (Prendergast et al. 2007), assuming that 
the transcriptionally inactive cLADs harbor primarily compacted 
chromatin. 

A defining characteristic of cLADs is their high A/T content. 
Likewise, paralogous genes that differ in their NL associations 
show a concomitant difference in A/T content. These observations 
reveal a tight link between spatial organization and simple nucle- 
otide composition of genomic regions. The molecular mechanisms 
that might be responsible for the peripheral positioning of A/T-rich 
DNA are presently unknown. Even though POU2F1 was a likely 
candidate because it binds an A/T-rich motif and is known to as- 
sociate with the NL, we found that MEFs lacking POU2F1 exhibit 
no detectable changes in NL-genome interactions. Lamins them- 
selves are prime candidates to mediate the anchoring of cLADs, 
because they have been reported to interact with A/T-rich se- 
quences in vitro (Luderus et al. 1994). However, a recent study of 
mouse ES cells lacking all lamins found few changes in the ex- 
pression of genes in LADs (Kim et al. 2011). It is possible that other 
NL-associated proteins provide redundancy for adherence of A/T- 
rich stretches. Furthermore, epigenetic mechanisms, e.g., involv- 
ing certain histone modifications, may also play a role. Finally, 
nuclear organization is likely also dependent on other inter- and 
intrachromosomal contacts (e.g., transcription factories, nucleolar 
interactions, etc.). We emphasize that a potential causal relation- 
ship between A/T richness and peripheral positioning needs to be 
addressed further. 

We found that the cLAD/ciLAD segmentation coincides 
largely with isochore distribution. Several explanations have been 
suggested for the isochore organization of vertebrate genomes 
(Eyre-Walker and Hurst 2001), but a role for A/T-rich isochores as 
NL-anchoring sites has not been considered so far. Our results raise 
the interesting possibility that the evolution of isochores is linked 
to the spatial organization of chromosomes in the nucleus. 

Centromeric satellite repeats have A/T contents of ~63%-66%,
which is in the range of cLADs. Indeed, in most mouse and human 
cell types, centromeres show a preferential localization near the NL 
(Weierich et al. 2003; Wiblin et al. 2005). An exception to this is 
human ES cells (Wiblin et al. 2005; Bartova et al. 2008). Centro- 
meres have a unique chromatin composition (Bergmann et al. 
2012), which may modulate NL interactions. Human telomeres are 
not typically found at the periphery (Luderus et al. 1996; Weierich 
et al. 2003; Ramirez and Surralles 2008), consistent with their rel- 
atively low (50%) A/T content. Mouse telomeres, which have the 
same sequence, can be peripheral to some degree (Vourc'h et al. 
1993; Weierich et al. 2003), but much of this is accounted for by 
the telocentric nature of mouse chromosomes (i.e., half of the 
telomeres are linked to pericentric regions, which tend to be pe- 
ripheral). Telomeric repeats and centromeric satellite repeats lack 
GATC sequence motifs, and therefore their NL interactions cannot 
be probed by DamID.

Several of the findings we describe here are reminiscent of 
what has been found for replication timing (RT) of DNA in S phase. 
Analogous to LADs, late-replicating domains are generally gene 
poor, A/T rich, and often located at the nuclear periphery (Hiratani
et al. 2009). Recently it has been shown that there are cell-type
invariant (constitutive) and cell-type specific (facultative) RT domains
(Hiratani et al. 2008, 2010) and that RT patterns are conserved for 
syntenic regions between mouse and human (Ryba et al. 2010; Yaffe 
et al. 2010). However, LADs and late-replicating domains overlap 
only partially (Peric-Hupkes et al. 2010), and there are also a few 
other notable differences. First, we observed an enrichment of 
synteny breakpoints on cLAD-ciLAD borders, whereas RT domains 
seem to shift slightly across synteny breaks (Yaffe et al. 2010). Sec- 
ond, RT becomes more tightly correlated to A/T content as cells 
differentiate (Hiratani et al. 2010), which contrasts with our observation
that fLADs adhere to the A/T-content rule in ESCs but not in dif- 
ferentiated cell types. These discrepancies underscore that RT orga- 
nization and LAD organization are only partially linked. 

Our results indicate that lamins A and B1 have largely overlapping
binding patterns along the genome, although some subtle
differences may exist in fLADs. Our data therefore imply that most 
loci do not differentiate between microdomains in the NL con- 
sisting of only one type of lamin (Shimi et al. 2008). Moreover, it 
appears that the pool of lamin A present in the nuclear interior 
(Broers et al. 1999; Dechat et al. 2000) does not bind to a distinct set 
of genomic loci. Possibly, internal lamin A does not interact with 
the genome at all, which is consistent with the high mobility of 
this pool (Broers et al. 1999). Alternatively, LADs may detach from 
the NL in a stochastic manner, and this internally located set of 
LADs may still interact with lamin A. Both scenarios would yield 
the high similarity of DamID maps for lamin A and lamin B1 as we
observed here. Future studies should be aimed at elucidating the 
molecular mechanisms that drive the interactions of specific ge- 
nomic regions with NL components. 

Methods 

Data sets 

DamID maps of lamin B1 in mouse ESCs, ACs, NPCs, and MEFs
were taken from Peric-Hupkes et al. (2010), and of lamin B1 in
human Tig3 fibroblasts from Guelen et al. (2008). Coordinates of
LADs in fly and worm were from van Bemmel et al. (2010) and
Ikegami et al. (2010), respectively. For this study, we generated new
DamID maps of lamin B1 in human ESCs and HT1080 cells and in
mouse POU2F1−/− and matching wild-type MEFs; and of lamin A
in human HT1080 cells and in mouse NPCs and ACs.



Cell culture and DamID

Human embryonic stem cell line SHEF2 was cultured under feeder- 
free conditions as described (Braam et al. 2008), with the modifi- 
cation that they were maintained in mTeSR medium (Stem Cell 
Technologies). Viral transduction with either Dam-lamin B1 or
Dam-only virus was performed as follows. Almost confluent cells in 
a six-well plate were washed twice with DMEM-F12 (GIBCO BRL), 
trypsinized for 120 sec at 37°C; following two additional washes, 
cells were resuspended thoroughly to single cells in mTeSR medium. 
Cells were then collected by centrifugation for 5 min at 350g and 
directly resuspended in a mixture containing 400 μL of virus (in
mTeSR), 1.6 mL of mTeSR, and 8 μg/mL Polybrene. Cells were
seeded at high density (typically 1:1, never exceeding 1:2) onto a six- 
well plate. Medium was changed every day, and genomic DNA was 
harvested 72 h after plating. A lentiviral Dam-lamin A-expressing 
construct was made through Gateway cloning as described in Vogel 
et al. (2006), and correct subnuclear localization to the NL of the 
fusion protein was confirmed by immunofluorescence microscopy 
using an antibody against the V5 epitope tag (data not shown).
POU2F1−/− MEFs and matching isogenic wild-type MEFs were
obtained from D. Tantin and cultured as described (Kang et al. 
2009). All subsequent DamID steps were performed as described 
before (Peric-Hupkes et al. 2010). 

NimbleGen microarray design 

Microarrays were custom-designed for DamID purposes. A selection
of NimbleGen whole-genome, high-density, ChIP-on-chip
probe sequences was made to attain the desired probe spacing and 
exclude probes containing GATC motifs. The tiling arrays cover 
the entire nonrepetitive genome, with a median probe spacing of 
1 kb (human) and 1.2 kb (mouse). 

Interspecies syntenic region mapping 

To construct full-chromosome maps of syntenic regions between 
two species, we mapped microarray probe coordinates from a
master species to genome coordinates of a slave species. In Figure 2,
for example, the master species is mouse (Fig. 2A) and the slave 
species is human (Fig. 2B). To perform the actual mapping, we used 
the UCSC liftOver tool (http://hgdownload.cse.ucsc.edu/admin/exe/)
together with liftOver-compatible chain files. These files
were obtained from UCSC
(http://hgdownload.cse.ucsc.edu/goldenPath/mm9/liftOver/) and
converted to reciprocal best chain files using code based on Jim Kent's
source tree (http://hgdownload.cse.ucsc.edu/admin/jksrc.zip,
doRecipBest.pl). The
liftOver procedure was initiated from every probe in the master 
species. For each match, we tried to pair the original probe in the 
master species to a representative probe in the slave species. This 
was achieved by searching for the probe nearest to the matched 
region in the slave species, within a fixed-width window. The 
width of this window was determined by the median spacing be- 
tween matched regions in the slave species, which was 1905 bp in 
the case of the mouse-to-human mapping. During a final step, 
results were curated to remove ambiguous probe matches so as 
to retain only one-to-one matches. This yielded final
species-to-species maps with a resolution of ~2.5 kb, derived from original
maps with a resolution of ~1 kb. 
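
The probe-pairing and curation steps described above can be sketched as follows. This is a minimal illustration, not the code used for the study: it assumes that liftOver has already produced one matched position per master probe on a single slave chromosome, and that the fixed-width window is expressed as a maximum distance on either side of the matched position. The names `nearest_probe`, `pair_probes`, `lifted`, and `max_dist` are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def nearest_probe(sorted_positions, target, max_dist):
    """Return the probe position closest to `target`, or None if it is
    farther away than `max_dist` bp (or if there are no probes)."""
    if not sorted_positions:
        return None
    i = bisect_left(sorted_positions, target)
    # Only the probes immediately left and right of `target` can be nearest.
    candidates = sorted_positions[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda p: abs(p - target))
    return best if abs(best - target) <= max_dist else None


def pair_probes(lifted, slave_positions, max_dist):
    """Pair each master probe with a representative slave probe and keep
    only one-to-one matches.

    lifted: dict mapping master probe id -> matched position in the slave genome
    slave_positions: probe positions on the slave chromosome
    """
    slave_sorted = sorted(slave_positions)
    matches = {}
    for probe, position in lifted.items():
        hit = nearest_probe(slave_sorted, position, max_dist)
        if hit is not None:
            matches[probe] = hit
    # Curation: discard slave probes claimed by more than one master probe.
    usage = Counter(matches.values())
    return {m: s for m, s in matches.items() if usage[s] == 1}
```

In this sketch, two master probes that land on the same slave probe are both dropped, which is one possible reading of "ambiguous"; other tie-breaking rules are conceivable.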

Calling of NL interaction status 

We fitted a two-state hidden Markov model (HMM) whereby 
emissions are distributed as Student's t variables. The mean and 
variance of DamID signals differ between states, but the degrees of
freedom (nu) are the same. Gaps in the probe coverage were filled by
evenly spaced null probe values. The parameters were estimated by 
an adaptation of the ECME algorithm to the HMM framework, 
showing faster convergence than regular EM when nu is unknown 
(Filion et al. 2010). State calls were derived through the Viterbi 
algorithm. This process was repeated separately for each cell type 
and species, yielding per-probe calls. Probes in the "bound" state 
are indicated as LAD-probes, probes in the "unbound" state as 
inter-LAD-probes. Note that in order to obtain a probe-by-probe 
readout of lamina status, we did not use the algorithm as used 
before (Guelen et al. 2008; Peric-Hupkes et al. 2010). 
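
To illustrate the state-calling step, the sketch below runs the Viterbi algorithm on a two-state HMM with Student's t emissions that share one value of nu. It assumes that all parameters are already known; the ECME-based estimation is not shown, and the function names and parameter values used with them are hypothetical.

```python
import math


def t_logpdf(x, mu, sigma, nu):
    """Log density of a location-scale Student's t distribution."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


def viterbi(signal, states, log_start, log_trans, nu):
    """Most probable state path for `signal`.

    states: list of (mu, sigma) per state; nu is shared by all states
    log_start[s]: log initial probability of state s
    log_trans[r][s]: log probability of moving from state r to state s
    """
    n = len(states)
    score = [log_start[s] + t_logpdf(signal[0], *states[s], nu)
             for s in range(n)]
    back = []
    for x in signal[1:]:
        prev, score, pointers = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s]
                         + t_logpdf(x, *states[s], nu))
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(range(n), key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

With state 0 taken as "unbound" and state 1 as "bound", the returned path gives a per-probe inter-LAD/LAD call.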

Concordance scoring 

The HMM calls are the basis for the concordance score, which is 
defined as the percentage of all calls that is in agreement (LAD vs. 
LAD or inter-LAD vs. inter-LAD) between two data sets (e.g., two 
different cell types or species). Statistical significance of concordance
scores was assessed by comparing scores against a null
distribution constructed from concordance scores for 100,000 random
circular permutations of calls.
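
A minimal sketch of the concordance score and its circular-permutation null is given below. It assumes binary per-probe calls held in equal-length lists for one chromosome; the function names, the default number of permutations, and the add-one correction in the P-value are assumptions made for illustration.

```python
import random


def concordance(calls_a, calls_b):
    """Percentage of positions at which two call vectors agree."""
    agree = sum(a == b for a, b in zip(calls_a, calls_b))
    return 100.0 * agree / len(calls_a)


def circular_shift(calls, offset):
    """Rotate a list of calls by `offset` positions, wrapping around."""
    offset %= len(calls)
    return calls[offset:] + calls[:offset]


def permutation_p_value(calls_a, calls_b, n_perm=1000, seed=0):
    """Observed concordance and the fraction of circular permutations of
    `calls_b` that reach at least that concordance."""
    rng = random.Random(seed)
    observed = concordance(calls_a, calls_b)
    hits = 0
    for _ in range(n_perm):
        shifted = circular_shift(calls_b, rng.randrange(1, len(calls_b)))
        if concordance(calls_a, shifted) >= observed:
            hits += 1
    # Assumption: add-one correction so that the estimate is never zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Circular shifts preserve the domain structure of the calls (run lengths and state frequencies), which is why they give a more conservative null than shuffling probes independently.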

Definition of constitutive and facultative regions 

The HMM procedure described above was performed on data for 
all four cell types described in Peric-Hupkes et al. (2010). Regions 
that were in perfect agreement were termed "constitutive" 
(70.76%) and the remainder "facultative" (29.24%). For the 
subset of mappable regions between mouse and human, these 
numbers are 66.70% and 33.30%, respectively. We chose to use an 
HMM-based method to identify constitutive regions rather than 
the method described in Peric-Hupkes et al. (2010) because the 
latter has the opposite purpose of identifying differentially bound 
regions and was used to call these differences in a gene-centric 
fashion between pairs of cell types. Here, we were interested in 
regions showing qualitatively identical binding across multiple 
cell types. 
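
The definition above amounts to a per-probe agreement check across cell types, which can be sketched as follows. The input layout (a dict of equal-length 0/1 call lists keyed by cell type) and the function name are assumptions for illustration.

```python
def classify_probes(calls_by_cell_type):
    """Label each probe as cLAD, ciLAD, or facultative.

    calls_by_cell_type: dict mapping cell type -> list of per-probe calls
    (1 = LAD, 0 = inter-LAD), all lists of equal length.
    """
    labels = []
    for calls in zip(*calls_by_cell_type.values()):
        if all(c == calls[0] for c in calls):
            # Perfect agreement across all cell types: constitutive.
            labels.append("cLAD" if calls[0] == 1 else "ciLAD")
        else:
            labels.append("facultative")
    return labels
```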

Gene annotations 

A compendium of gene annotations was constructed using 
annotations downloaded from the UCSC Table Browser. The 
annotations used are UCSC genes (table "knownGene"), Con- 
sensus Coding Sequence (table "ccdsGene"), RefSeq (table 
"refGene"), Vega (table "vegaGene"), Vega pseudogenes (table 
"vegaPseudoGene"), and Ensembl (table "ensGene"). Merged 
together, the compendium describes a total of 34,129 mouse 
genes, collectively covering 45.62% of the mouse genome. 

Coverage of genes, gene deserts, and LINE, A/T-rich, 
and SINE elements 

Gene deserts were defined as regions >500 kb containing no genes. 
LINE, simple A/T-rich, and SINE elements were obtained from the 
UCSC Table Browser. Coverage of cLAD, facultative, and ciLAD 
regions was obtained by calculating the overlap of these regions 
with genes, gene deserts, LINEs, A/T-rich elements, and SINEs. 
Error bars were obtained by performing 1000 random circular 
permutations of cLAD, ciLAD, and fLAD regions, calculating the 
overlap for these random regions and reporting two standard de- 
viations around mean random overlap numbers. 
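
As an illustration of the gene-desert definition, the sketch below scans the gene intervals of one chromosome and reports gene-free stretches longer than 500 kb. It assumes half-open (start, end) coordinates and treats chromosome ends like any other gap; both choices, and the function name, are assumptions rather than details taken from the analysis.

```python
def gene_deserts(genes, chrom_length, min_size=500_000):
    """Return (start, end) intervals longer than `min_size` that contain
    no genes.

    genes: iterable of (start, end) gene intervals on one chromosome
    """
    deserts = []
    cursor = 0  # end of the right-most gene seen so far
    for start, end in sorted(genes):
        if start - cursor > min_size:
            deserts.append((cursor, start))
        cursor = max(cursor, end)  # handles overlapping or nested genes
    if chrom_length - cursor > min_size:
        deserts.append((cursor, chrom_length))
    return deserts
```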

Concordance between species in absence of genes 

The interspecies concordance of nongenic regions was assessed by 
removing mouse probes, and their representative human probes, 
that overlap at least 50% with any of the 34,129 genes in the mouse
gene compendium described above. 

Calling of chromosome breaks 

Breaks in synteny, i.e., chromosomal rearrangements, were called 
by finding transitions in reconstituted slave chromosomes. This is 
illustrated by transitions in color in Figure 2B and Supplemental 
Figure 1. We used a sliding probe-based window corresponding on 
average to 50 kb to identify transitions. Majority voting was used as 
a pre-processing step to remove spurious transitions. This resulted 
in a set of 152 major breaks for the full genome and 143 breaks for 
constitutive regions only. 
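
The break-calling idea can be sketched as majority-vote smoothing followed by detection of label transitions. The sketch assumes that each master probe, in genomic order, carries the label of the slave chromosome it maps to, and that the window is given as a number of probes; the function names are hypothetical.

```python
from collections import Counter


def majority_smooth(labels, window):
    """Replace each label by the most common label in a window of
    `window` probes centered on it (truncated at the ends)."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        neighborhood = labels[max(0, i - half):i + half + 1]
        smoothed.append(Counter(neighborhood).most_common(1)[0][0])
    return smoothed


def call_breaks(labels):
    """Indices at which the slave chromosome label changes."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
```

Smoothing first removes isolated probes that map elsewhere, so that only transitions supported by a run of probes are reported as breaks.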

Chromosome breaks versus lamina status 

For each of the rearrangement loci within constitutive regions, we 
calculated the average mouse NL interaction status in a window 
corresponding on average to 50 kb, centered around the break. The
binary LAD/inter-LAD calls yielded by the HMM were used as in- 
dicators for lamina status. The result could naturally be clustered 
into three groups: LADs (24), LAD-borders (39), and inter-LADs (80), 
where the size of borders has an expected upper bound of ~17 kb.
Distributions of random overlap for the three groups were obtained 
by circularly permuting the positions of the 143 chromosome 
breaks, for all possible positions, and recalculating overlap. P-values
were obtained by comparing the real overlap against these distri- 
butions. Note that this approach retains the relative genomic dis- 
tribution of cLADs, ciLADs, and LAD borders, thereby correcting for 
differences in genomic coverage of these three categories. The same 
analysis was performed for the full genome using ESC data, over 152 
identified chromosomal breaks, with similar results (see text). 

Homo/heterotypic analysis

One-tailed Fisher's exact tests were performed to test whether 
breakpoints inside mouse cLADs are enriched for coinciding with 
human LADs. For the 143 breakpoints in constitutive regions, this 
is, indeed, the case (P = 1.126 × 10^-7). For the 152 breakpoints in
the full genome, using data for mouse and human ESCs, this also
holds true (P = 3.56 × 10^-7).
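
For reference, a one-tailed Fisher's exact test on a 2 × 2 table can be computed directly from the hypergeometric distribution, as in the sketch below. The function name is hypothetical, and the sketch makes no attempt to reproduce the contingency tables or the P-values reported here.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-tailed P-value for the table [[a, b], [c, d]]: the probability,
    with all margins fixed, of a top-left count of at least `a`."""
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    numerator = sum(comb(row1, k) * comb(total - row1, col1 - k)
                    for k in range(a, upper + 1))
    return numerator / comb(total, col1)
```

In the setting described above, rows could index whether a breakpoint lies in a mouse LAD and columns whether it coincides with a human LAD; this layout is an assumption.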

Definition of regions used for classification 

The regions selected for classification consisted of regions that share 
a common lamina status in mouse (i.e., either LAD or inter-LAD in 
all assayed cell types). These are obtained by extending microarray 
probe positions to 1-kb regions. Regions annotated in the mouse 
gene compendium are removed, including 2 kb upstream and 
downstream of genes, to disregard any gene-specific patterns that 
may be present in these regions. Regions smaller than 1 kb are also 
disregarded. The remainder of the regions are scored based on their 
mean ranked lamina association across the assayed mouse cell types. 
For facultative regions, the standard deviation of the ranked lamina 
association is used as the score. Scores are then used to order regions, 
i.e., high and low lamina associations for cLAD and ciLAD regions 
and high variation in lamina association for facultative regions. The 
top 1000 cLAD/ciLAD/facultative regions are used for subsequent 
analyses. Facultative LAD and inter-LAD regions specific to a par- 
ticular cell type (Fig. 4) were selected in a similar manner, with HMM 
state calls being specific to the cell type of interest and scores based 
on the level of lamina association of the same cell type only. For the 
assessment of phyloP conservation scores in cLAD, ciLAD, and 
facultative regions (Supplemental Fig. 3), we used 1000 1-kb regions 
for each group. Intronic regions were selected by size matching in- 
trons positioned in cLAD, ciLAD, and facultative regions, giving 
preference to large regions. The center 1 kb of each region was 
subsequently selected for assessing conservation levels. 

Construction of classifiers 

Genomic sequences were obtained for all selected regions, and the 
number of occurrences of k-mers (k = 1 ... 5) was scored for each
region. Support vector machine classifiers were trained using the
e1071 R package with default parameters, using a radial basis kernel.
Their performance was assessed using 10-fold cross-validation.
Cross-validation accuracy [i.e., (TN + TP)/(TN + TP + FN + FP)] 
scores are the result of averaging accuracies over folds. 
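
The feature construction and the accuracy measure can be sketched as follows. Classifier training itself is not shown. The sketch assumes that k-mers are counted with overlap on the given strand only and that k-mers containing characters other than A, C, G, and T are ignored; these details and the function names are assumptions.

```python
from collections import Counter
from itertools import product


def kmer_features(sequence, max_k=5):
    """Counts of all k-mers (k = 1 ... max_k) in a fixed order:
    by increasing k, then alphabetically over A, C, G, T."""
    sequence = sequence.upper()
    features = []
    for k in range(1, max_k + 1):
        counts = Counter(sequence[i:i + k]
                         for i in range(len(sequence) - k + 1))
        features.extend(counts["".join(kmer)]
                        for kmer in product("ACGT", repeat=k))
    return features


def accuracy(tp, tn, fp, fn):
    """Classification accuracy, (TN + TP)/(TN + TP + FN + FP)."""
    return (tn + tp) / (tn + tp + fn + fp)
```

For k = 1 ... 5 this yields 4 + 16 + 64 + 256 + 1024 = 1364 features per region.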

Isochore analysis 

Mouse isochore definitions were obtained from Costantini et al. 
(2009), who report five types of genomic regions based on their
G/C content (L1, L2, H1, H2, and H3). For Supplemental Figure 5,
we have grouped L1 and L2 in one high A/T-content class, and H1,
H2, and H3 in one low A/T-content class.



Analysis of A/T content versus gene density in ciLAD regions 

We calculated gene density in windows of 99 microarray probes, 
roughly corresponding to 100 kb, as the number of probes that 
overlap with an annotated gene. Per-probe A/T content was 
calculated within windows of 1 kb, centered on the genomic 
probe position. We then scored intergenic ciLAD regions based
on their gene density and/or A/T content being below or above 
the genome-wide median, resulting in an odds ratio. Ten thou- 
sand random circular permutations of A/T-content scores were 
performed to derive a null distribution of odds ratios. Compar- 
ing the original odds ratio to this null yielded a P-value of 
0.0986, indicating that there is not a significant difference in 
A/T content between gene-rich and gene-poor intergenic ciLAD 
regions. 
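
A sketch of a median-split odds ratio with a circular-permutation null is given below. Several details are assumptions made for illustration: the 0.5 pseudo-count per cell, the two-sided comparison on the log scale, the add-one correction, and all function names.

```python
import math
import random
from statistics import median


def odds_ratio(gene_density, at_content):
    """Odds ratio of the 2x2 table obtained by splitting both variables
    at their respective medians."""
    d_med, a_med = median(gene_density), median(at_content)
    # Assumption: a 0.5 pseudo-count per cell avoids division by zero.
    table = [[0.5, 0.5], [0.5, 0.5]]
    for d, a in zip(gene_density, at_content):
        table[int(d > d_med)][int(a > a_med)] += 1
    return (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])


def circular_permutation_p(gene_density, at_content, n_perm=10_000, seed=0):
    """Fraction of circular shifts of the A/T scores whose odds ratio is
    at least as far from 1 (on the log scale) as the observed one."""
    rng = random.Random(seed)
    observed = abs(math.log(odds_ratio(gene_density, at_content)))
    n = len(at_content)
    hits = 0
    for _ in range(n_perm):
        k = rng.randrange(1, n)
        shifted = at_content[k:] + at_content[:k]
        if abs(math.log(odds_ratio(gene_density, shifted))) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```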



Adherence to A/T-content rule of cell-type specific LADs 

The classifier constructed based on A/T content for discriminating 
between cLAD and ciLAD regions was applied to regions that show 
a cell-type specific lamina association. In other words, these re- 
gions are LAD or inter-LAD exclusively for one particular cell type. 
The adherence to the A/T-content rule is reported by way of the 
classification accuracy, of which the variance is estimated using 
a bootstrapping procedure. The number of cell-type specific LAD/ 
inter-LAD regions was set to 1000, as in the cLAD/ciLAD classifi- 
cation procedure. 
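
One simple way to estimate the variance of a classification accuracy by bootstrapping is sketched below. It assumes a list of per-region outcomes (1 for a region classified in agreement with the A/T-content rule, 0 otherwise) and resampling of regions with replacement; the exact bootstrapping scheme used for the analysis is not specified here, so this is only one plausible variant with hypothetical names.

```python
import random
from statistics import mean, variance


def bootstrap_accuracy(correct, n_boot=1000, seed=0):
    """Accuracy and bootstrap estimate of its variance.

    correct: list of 0/1 outcomes, one per classified region
    """
    rng = random.Random(seed)
    n = len(correct)
    # Resample regions with replacement and recompute accuracy each time.
    resampled = [mean(rng.choices(correct, k=n)) for _ in range(n_boot)]
    return mean(correct), variance(resampled)
```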



Paralog analysis 

Within-species paralog information was obtained from BioMart 
(http://www.biomart.org; obtained on Oct 10, 2011). We in- 
cluded one-to-one as well as many-to-many paralogs. The mean 
DamID log2-ratio score was calculated for each gene and plotted
against (each of) its paralog partners. For many-to-many paralogs, 
all gene pairs are separately represented in Figure 5 and Supple- 
mental Figure 6. A/T-content information for entire gene bodies 
was also retrieved from BioMart, and mouse gene expression data 
were obtained from Mikkelsen et al. (2007). To assess the statistical 
significance of the patterns observed in Figure 5 and Supplemen- 
tal Figure 6, we used the mean difference in A/T content or ex- 
pression for the upper-left quadrant as a test statistic. The null 
distribution was estimated by repeatedly (10,000x) randomly 
shuffling A/T content and gene expression values of genes, 
while keeping paralog pairings intact, and calculating the mean 
upper-left quadrant value for each iteration. In all cases, the ob- 
served means were more extreme than those expected under 
the null (P < 1 × 10^-4).
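
The shuffling test can be sketched as follows. The sketch assumes that the upper-left quadrant contains paralog pairs in which the first gene has a negative mean DamID score and its partner a positive one, that the difference is taken as partner minus first gene, and that extremeness is judged two-sided; these conventions, the add-one correction, and all names are assumptions for illustration.

```python
import random


def quadrant_mean_diff(pairs, damid, value):
    """Mean difference in `value` (e.g., A/T content) over paralog pairs
    (a, b) with damid[a] < 0 < damid[b]."""
    diffs = [value[b] - value[a] for a, b in pairs
             if damid[a] < 0 < damid[b]]
    return sum(diffs) / len(diffs)


def shuffle_test(pairs, damid, value, n_perm=10_000, seed=0):
    """Shuffle `value` across genes while keeping pairings and DamID
    scores intact, and compare the shuffled statistic to the observed one."""
    rng = random.Random(seed)
    observed = quadrant_mean_diff(pairs, damid, value)
    genes = list(value)
    values = [value[g] for g in genes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        shuffled = dict(zip(genes, values))
        if abs(quadrant_mean_diff(pairs, damid, shuffled)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```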



POU2F1 motif analysis 

We scanned the mouse genome for occurrences of the TRANSFAC 
(Matys et al. 2003) OCT1_01 motif, at a precision of 90% of the 
information content. We then aligned the differential signal 
obtained from subtracting DamID lamin B1 scores of POU2F1+/+
cells from those of POU2F1−/− cells around the identified motif
sites (Supplemental Fig. 8D). As a control, we performed the same 
alignment for control sites positioned exactly halfway in between 
OCT1_01 motif sites. 



278 Genome Research 

www.genome.org 






Data access 

Human lamin A (HT1080) and lamin B1 (ESC and HT1080) data
are available from the NCBI Gene Expression Omnibus (GEO)
(http://www.ncbi.nlm.nih.gov/geo/) under accession number
GSE22428. Mouse lamin A (NPC and AC) and lamin B1 (POU2F1−/−
and POU2F1+/+ MEFs) data are available from GEO under accession
number GSE36132.